Abstract - This paper studies the properties of $\ell^1$-analysis
regularization for the resolution of linear inverse problems.
Most previous works consider sparse synthesis priors where the
sparsity is measured as the $\ell^1$ norm of the coefficients that
synthesize the signal in a given dictionary. In contrast, the more
general analysis regularization minimizes the $\ell^1$ norm of the
correlations between the signal and the atoms in the dictionary.
The corresponding variational problem includes several well-known
regularizations such as the discrete total variation and
the fused lasso.

We give a sufficient condition to ensure that a signal is the 
unique solution of the analysis regularization when there is 
no noise in the observations. The same criterion ensures the 
robustness of the sparse analysis solution to a small noise in the 
observations. We also define a stronger sufficient condition that 
ensures robustness to an arbitrary bounded noise. In the special 
case of synthesis regularization, our contributions recover already 
known results, that are hence generalized to the analysis setting. 
We illustrate these theoretical results on practical examples to
study the robustness of the total variation and the fused lasso 
regularizations. 

Index Terms - sparsity, analysis regularization, synthesis regularization,
inverse problems, $\ell^1$ minimization, union of subspaces,
noise robustness, total variation, wavelets, Fused Lasso.

I. Introduction 

A. Inverse Problems and Signal Priors 

This paper considers the stability of inverse problems regu- 
larization using sparse priors. Many data acquisition systems 
are modeled using a linear mapping of some unknown source 
perturbed by an additive noise. This reads 

$$ y = \Phi x_0 + w, \qquad (1) $$

where $y \in \mathbb{R}^Q$ are the observations, $x_0 \in \mathbb{R}^N$ the unknown
signal to recover, $w$ the noise and $\Phi$ a linear operator which
maps the signal domain $\mathbb{R}^N$ into the observation domain $\mathbb{R}^Q$,
where $Q \leq N$. The mapping $\Phi$ is in general ill-conditioned,
which makes the recovery of an approximation of $x_0$ difficult,
see for instance [1] for an introduction to inverse problems.

Regularization through variational analysis is a popular way
to compute an approximation of $x_0$ from the measurements $y$
as defined in (1). The general framework reads

$$ \min_{x \in \mathbb{R}^N} \frac{1}{2} \|y - \Phi x\|_2^2 + \lambda R(x). \qquad (2) $$


This requires defining a prior $R$ to enforce some regularity on
the recovered signal. We restrict our attention in this paper to an
$\ell^2$ fidelity measure $\|y - \Phi x\|_2^2$ that reflects some Gaussian prior
on the noise $w$. The regularization parameter $\lambda > 0$ should be
adapted to match the noise level and the expected regularity
of the data $x_0$.

For noiseless observations, $w = 0$, one has to take the limit
$\lambda \to 0^+$ and solve the constrained problem

$$ \min_{x \in \mathbb{R}^N} R(x) \quad \text{subject to} \quad \Phi x = y. \qquad (3) $$


A popular class of priors are quadratic Hilbert norms
of the form $R(x) = \langle x, Kx \rangle$ where $K$ is some positive
definite kernel. The minimizations (2) and (3) correspond to
a Tikhonov regularization which typically enforces some kind 
of uniform smoothness in the recovered data. More advanced 
priors rely on non-quadratic functionals which enforce sparsity 
of the signal over some transformed domain (e.g. its wavelet 
transform or its gradient). These sparse priors are the subject 
of this article, and are described in the following section. 



B. Notations 

Our paper focuses on real vector spaces. In all the following,
the variable $x$ will denote a vector in $\mathbb{R}^N$, $y$ will be a vector
in $\mathbb{R}^Q$ and $\alpha$ a vector in $\mathbb{R}^P$.

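The sign vector $\mathrm{sign}(\alpha)$ of $\alpha$ is

$$ \forall k \in \{1, \dots, P\}, \quad \mathrm{sign}(\alpha)_k =
\begin{cases}
+1 & \text{if } \alpha_k > 0, \\
0 & \text{if } \alpha_k = 0, \\
-1 & \text{if } \alpha_k < 0.
\end{cases} $$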



The support of $\alpha \in \mathbb{R}^P$ is

$$ \mathrm{supp}(\alpha) = \{ i \in \{1, \dots, P\} \setminus \alpha_i \neq 0 \}. $$

For a set $I$, $|I|$ will denote the cardinality of $I$.

In the following we make use of operator matrix norms. The
$(p, q)$-operator norm of a matrix $M$ is

$$ \|M\|_{p,q} = \max_{x \neq 0} \frac{\|Mx\|_q}{\|x\|_p}. $$


The matrix $M_J$ for $J$ a subset of $\{1, \dots, P\}$ is the submatrix
whose columns are indexed by $J$. Similarly, the vector $s_J$
is the reduced dimensional vector built upon the components
of $s$ indexed by $J$.

The matrix Id is the identity matrix, where the underlying
space is implicit. For any matrix $M$, $M^+$ is the Moore-Penrose
pseudoinverse of $M$ and $M^*$ is the adjoint matrix of $M$.

C. Synthesis and Analysis Sparsity

a) Synthesis sparsity: Sparse regularization is a popular
class of priors to model natural signals and images, see for
instance [2]. In its simplest form, the sparsity of coefficients
$\alpha \in \mathbb{R}^P$ is measured using the $\ell^0$ pseudo-norm

$$ R_0(\alpha) = \|\alpha\|_0 = |\mathrm{supp}(\alpha)|. $$

Minimizing (2) or (3) with $R = R_0$ is however known to be in
some sense NP-hard, see for instance [3]. Several workarounds
have been proposed to alleviate this difficulty. A first class
of methods uses greedy algorithms [4]. The most popular
algorithms are Matching Pursuit [5] and Orthogonal Matching
Pursuit [6], [7]. A second class of methods, which is the focus
of this paper, replaces the $\ell^0$ pseudo-norm by its $\ell^1$ convex
relaxation [8].

A dictionary $D = (d_i)_{i=1}^P$ is a (possibly redundant) collection
of $P$ atoms $d_i \in \mathbb{R}^N$. It can also be viewed as a linear
mapping from $\mathbb{R}^P$ to $\mathbb{R}^N$ which is used to synthesize a signal
$x \in \mathrm{Span}(D) \subseteq \mathbb{R}^N$ as

$$ x = D\alpha = \sum_{i=1}^P \alpha_i d_i. $$


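In the redundant case ($P > N$) this decomposition is non-unique.
The sparsest set of coefficients, according to the $\ell^1$
norm, defines a prior

$$ R_S(x) = \min_{\alpha \in \mathbb{R}^P} \|\alpha\|_1 \quad \text{subject to} \quad x = D\alpha. $$

Any solution $x$ of (2) using $R = R_S$ can be written as $x = D\alpha$
where $\alpha$ is a solution of

$$ \min_{\alpha \in \mathbb{R}^P} \frac{1}{2} \|y - \Psi\alpha\|_2^2 + \lambda \|\alpha\|_1, \qquad (4) $$
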

where $\Psi = \Phi D$, and $x = D\alpha$. It was first introduced
in [9] in the statistical community and coined Lasso. It is also
known in the signal processing community as Basis Pursuit
Denoising [10]. Such a problem corresponds to a so-called
synthesis regularization because one assumes the sparsity of
the coefficients $\alpha$ that synthesize the signal $x = D\alpha$. In the
noiseless case, $w = 0$, one uses the constrained optimization
(3), which reads

$$ \min_{\alpha \in \mathbb{R}^P} \|\alpha\|_1 \quad \text{subject to} \quad y = \Psi\alpha, \qquad (5) $$


and is referred to as Basis Pursuit [10]. Taking $D = \mathrm{Id}$ to be
the identity imposes sparsity of the signal itself, and is used
for instance for sparse spikes deconvolution in seismic imaging [11].
Sparsity in orthogonal as well as redundant wavelet
dictionaries is popular to model natural signals and images
that exhibit sharp transitions. Besides the regularization of
inverse problems, a popular application of sparsity is blind
source separation [12].

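b) Analysis sparsity: Analysis regularization corresponds
to using $R = R_A$ in (2) where

$$ R_A(x) = \|D^* x\|_1 = \sum_{i=1}^P |\langle d_i, x \rangle|, $$

which leads to the following minimization problem

$$ \min_{x \in \mathbb{R}^N} \frac{1}{2} \|y - \Phi x\|_2^2 + \lambda \|D^* x\|_1. \qquad (\mathcal{P}_\lambda(y)) $$
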

As the objective in $(\mathcal{P}_\lambda(y))$ is proper, continuous and convex, it
is a classical existence result that the set of (global) minimizers
is nonempty and compact if and only if

$$ \mathrm{Ker}\, \Phi \cap \mathrm{Ker}\, D^* = \{0\}. \qquad (H_0) $$


Throughout this paper, we suppose that this condition holds.

Note that the analysis problem $(\mathcal{P}_\lambda(y))$ is in some sense more
general than the synthesis one (4) because the latter is
recovered by setting $D = \mathrm{Id}$ and $\Psi = \Phi$.

In the noiseless case, $w = 0$, one uses the constrained
optimization which reads

$$ \min_{x \in \mathbb{R}^N} \|D^* x\|_1 \quad \text{subject to} \quad \Phi x = y. \qquad (\mathcal{P}_0(y)) $$


The most popular analysis sparse regularization is the total
variation, which was first introduced for denoising in [13]. It
corresponds to using a derivative operator $D^*$. In the case of
1-D discrete signals, one can use forward finite differences
$D = D_{\mathrm{DIF}}$ where

$$ D_{\mathrm{DIF}} =
\begin{pmatrix}
-1 &        &        & 0  \\
+1 & -1     &        &    \\
   & +1     & \ddots &    \\
   &        & \ddots & -1 \\
0  &        &        & +1
\end{pmatrix}. \qquad (6) $$


The corresponding prior $R_A$ favors piecewise constant signals
and images. A review of total variation regularization can be
found in [14].
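
As a minimal sketch of the construction above (Python with NumPy is our choice of language, and the helper names are illustrative rather than taken from the paper), the dictionary $D_{\mathrm{DIF}}$ of (6) and the prior $R_A(x) = \|D^* x\|_1$ can be formed as follows, assuming a signal of length $N$ and $P = N - 1$ atoms $d_i = e_{i+1} - e_i$.

```python
import numpy as np

def forward_difference_dictionary(n):
    """n x (n-1) dictionary with atoms d_i = e_{i+1} - e_i (hypothetical helper)."""
    d = np.zeros((n, n - 1))
    idx = np.arange(n - 1)
    d[idx, idx] = -1.0       # -1 on the main diagonal
    d[idx + 1, idx] = 1.0    # +1 on the first sub-diagonal
    return d

def total_variation(x):
    """Analysis prior R_A(x) = ||D^* x||_1 for D = D_DIF."""
    d = forward_difference_dictionary(len(x))
    return float(np.abs(d.T @ x).sum())

x = np.array([0.0, 0.0, 2.0, 2.0, 2.0, -1.0])   # piecewise constant, two jumps
tv = total_variation(x)                          # sums the jump amplitudes |2| + |-3|
```

The correlations $D^* x$ coincide with the forward differences of $x$, so only the two jumps contribute to the prior.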

The theoretical properties of total variation for denoising
have been extensively studied. A distinctive feature of this
regularization is that it tends to produce a staircasing effect,
where discontinuities not present in the original data might be
created by the regularization. This effect has been studied by
Nikolova in [15] in 2-D. The stability of discontinuities for
2-D total variation denoising is the core of the work of [16].
Section IV-C shows how our results also shed some light on
this staircasing effect for 1-D signals.

It is also possible to use a dictionary $D$ of translation
invariant wavelets, so that the corresponding prior $R_A$ can
be interpreted as a sort of multi-scale total variation. Such a
prior tends to favor piecewise regular signals and images. An
extensive study of these redundant dictionaries highlighting
differences between synthesis and analysis is done in [17].

As a last example of sparse analysis regularization, let us
mention the Fused Lasso [18], where $D$ is the concatenation
of a discrete derivative and a weighted identity. The corresponding
prior $R_A$ encourages both sparsity of the signal and
its derivative, hence grouping blocks of non-zero coefficients
together.

c) Synthesis versus analysis: In a synthesis prior, the
generative vector $\alpha$ is sparse in the dictionary $D$ whereas
in an analysis prior, the correlation between the signal $x$ and
the dictionary $D$ is sparse. When $D$ is orthogonal, $\mathcal{P}_\lambda(y)$
and Lasso define the same regularization. As highlighted
in [19], synthesis and analysis regularizations however differ
significantly when $D$ is redundant. Some connections between
total variation regularization and wavelet sparsity have been
drawn in [20].



D. Union of Subspaces Model 

Analysis regularization favors the sparsity of D*x. It is thus 
natural to keep track of the support of this correlation vector, 
as done in the following definition. 

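Definition 1. The D-support $I$ of a vector $x \in \mathbb{R}^N$ is defined
as $I = \mathrm{supp}(D^* x)$. Its D-cosupport $J$ is defined as
$J = I^c = \{1, \dots, P\} \setminus I$.

A signal $x$ such that $D^* x$ is sparse lives in a cospace $\mathcal{G}_J$
of small dimension where $\mathcal{G}_J$ is defined as follows.

Definition 2. Given a dictionary $D$, and $J$ a subset of
$\{1, \dots, P\}$, the cospace $\mathcal{G}_J$ is defined as

$$ \mathcal{G}_J = \mathrm{Ker}\, D_J^* = (\mathrm{Im}\, D_J)^\perp, $$

where $D_J$ is the subdictionary whose columns are indexed by $J$.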

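The signal space can thus be decomposed as a union of
subspaces of increasing dimensions

$$ \mathbb{R}^N = \bigcup_{k \in \{0, \dots, N\}} \Theta_k
\quad \text{where} \quad
\Theta_k = \{ \mathcal{G}_J \setminus J \subseteq \{1, \dots, P\} \text{ and } \dim \mathcal{G}_J = k \}. \qquad (7) $$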



The union of subspaces associated to synthesis regularization
($D = \mathrm{Id}$) defines $\Theta_k$ as the set of axis-aligned subspaces
of dimension $k$. For the 1-D total variation prior, where
$D = D_{\mathrm{DIF}}$ as defined in (6), $\Theta_k$ is the set of piecewise constant
signals with $k - 1$ steps. A detailed analysis of several sparse
analysis subspaces, including translation invariant wavelets,
can be found in [21].
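
Definitions 1 and 2 can be illustrated numerically. The sketch below is only an illustration under our own assumptions (NumPy, a numerical tolerance `tol` to decide which correlations vanish, and hypothetical helper names): it computes the D-support, the D-cosupport, and an orthonormal basis of the cospace $\mathcal{G}_J = \mathrm{Ker}\, D_J^*$ from a singular value decomposition.

```python
import numpy as np

def d_support(d, x, tol=1e-10):
    """Return (D-support I, D-cosupport J) of x, thresholding |D^* x| at tol."""
    corr = d.T @ x
    support = np.flatnonzero(np.abs(corr) > tol)
    cosupport = np.flatnonzero(np.abs(corr) <= tol)
    return support, cosupport

def cospace_basis(d, cosupport, tol=1e-10):
    """Orthonormal basis of G_J = Ker D_J^*, read off the SVD of D_J^*."""
    n = d.shape[0]
    if len(cosupport) == 0:
        return np.eye(n)                       # no constraint: G_J is the whole space
    _, sing, vt = np.linalg.svd(d[:, cosupport].T)
    rank = int((sing > tol).sum())
    return vt[rank:].T                         # right singular vectors of the null space

n = 6
idx = np.arange(n - 1)
d = np.zeros((n, n - 1))
d[idx, idx], d[idx + 1, idx] = -1.0, 1.0       # D_DIF as in (6)
x = np.array([0.0, 0.0, 2.0, 2.0, 2.0, -1.0])
support, cosupport = d_support(d, x)
u = cospace_basis(d, cosupport)
```

Here the signal has two steps, so its cospace has dimension 3, in agreement with the description of $\Theta_k$ for total variation.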

More general unions of subspaces (not necessarily corresponding
to analysis regularizations) have been introduced in
sampling theory to model various kinds of non-linear signal
ensembles, see for instance [22]. Union of subspaces models
have been extensively studied for the recovery from pointwise
sampling measurements [22] and random measurements [23],
[24], [25], [26].



E. Organization of this Paper 

Section II details our three contributions. Section III draws
some connections with relevant previous works. Section IV
illustrates our results using concrete examples. Section V gives
the proofs of the three contributions.

II. Contributions 
This paper proves the following three results: 

1) Robustness to small noise: we give a sufficient condition
on $x_0$ ensuring that the solution of $\mathcal{P}_\lambda(y)$ is close
to $x_0$ when $w$ is small enough.

2) Noiseless identifiability: the same condition ensures
that $x_0$ is the unique solution of $\mathcal{P}_0(y)$ when $w = 0$.

3) Robustness to bounded noise: we give a sufficient
condition on the D-cosupport of $x_0$ ensuring that the
solution of $\mathcal{P}_\lambda(y)$ is close to $x_0$ when $w$ is an arbitrary
bounded noise and $\lambda$ is large enough.

Each contribution is rigorously described in the following subsections.

Note that these contributions extend previously known results
in the synthesis case, see for instance [27].
With the notable exception of the works [21], [32] that study
analysis identifiability, to the best of our knowledge, it is the
first time these questions are addressed in the analysis case.

For some cosupport $J$, it is important to ensure the invertibility
of $\Phi$ on $\mathcal{G}_J$. This is achieved by imposing

$$ \mathrm{Ker}\, \Phi \cap \mathcal{G}_J = \{0\}. \qquad (H_J) $$

Definition 3. Let $J$ be a D-cosupport. Suppose that $(H_J)$
holds. We define the operator $A^{[J]}$ as

$$ A^{[J]} = U (U^* \Phi^* \Phi U)^{-1} U^*, \qquad (8) $$

where $U$ is a matrix whose columns form a basis of $\mathcal{G}_J$.
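
To make Definition 3 concrete, here is a small sketch of how $A^{[J]}$ could be assembled. It assumes NumPy, an orthonormal basis $U$ obtained by SVD, and a random Gaussian $\Phi$ chosen purely for illustration; the function name is hypothetical. The Gram matrix $U^* \Phi^* \Phi U$ is invertible exactly when $(H_J)$ holds.

```python
import numpy as np

def cospace_operator(phi, d, cosupport, tol=1e-10):
    """A^[J] = U (U^* Phi^* Phi U)^{-1} U^*, with U a basis of G_J = Ker D_J^*."""
    _, sing, vt = np.linalg.svd(d[:, cosupport].T)
    u = vt[int((sing > tol).sum()):].T
    gram = u.T @ phi.T @ phi @ u          # invertible exactly when (H_J) holds
    return u @ np.linalg.solve(gram, u.T), u

rng = np.random.default_rng(0)
n, q = 6, 4
idx = np.arange(n - 1)
d = np.zeros((n, n - 1))
d[idx, idx], d[idx + 1, idx] = -1.0, 1.0  # D_DIF as in (6)
phi = rng.standard_normal((q, n))         # illustrative measurement operator
a, u = cospace_operator(phi, d, cosupport=np.array([0, 2, 3]))
x = u @ np.array([1.0, -2.0, 0.5])        # an arbitrary element of G_J
```

On $\mathcal{G}_J$ the operator $A^{[J]} \Phi^* \Phi$ acts as the identity, which is the sense in which $A^{[J]}$ inverts $\Phi^* \Phi$ restricted to the cospace; the result does not depend on the basis $U$ chosen.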

A. Robustness to Small Noise 

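Our next contribution shows that analysis regularization is
robust to a small noise under a condition on $\mathrm{sign}(D^* x_0)$.

Definition 4. Let $s \in \{-1, 0, +1\}^P$, $I$ its D-support and
$J$ its D-cosupport. We suppose $(H_J)$ holds. The analysis
Identifiability Criterion IC of $s$ is defined as

$$ \mathrm{IC}(s) = \min_{u \in \mathrm{Ker}\, D_J} \|\Omega^{[J]} s_I - u\|_\infty
\quad \text{where} \quad
\Omega^{[J]} = D_J^+ (\Phi^* \Phi A^{[J]} - \mathrm{Id}) D_I. $$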
We have the following theorem. 

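Theorem 1. Let $x_0 \in \mathbb{R}^N$ be a fixed vector of D-cosupport
$J$, and of D-support $I = J^c$. Suppose $(H_J)$ holds and
$\mathrm{IC}(\mathrm{sign}(D^* x_0)) < 1$. There exist two constants $c_J > 0$ and
$\tilde{c}_J > 0$, such that if $y = \Phi x_0 + w$, where

$$ \frac{\|w\|_2}{T} < \frac{\tilde{c}_J}{c_J}
\quad \text{and} \quad
T = \min_{i \in \{1, \dots, |I|\}} |D_I^* x_0|_i, $$

and if $\lambda$ satisfies

$$ c_J \|w\|_2 < \lambda < T \tilde{c}_J, $$

the vector defined by

$$ x^* = x_0 + A^{[J]} \Phi^* w - \lambda A^{[J]} D_I s_I, \qquad (9) $$

is the unique solution of $\mathcal{P}_\lambda(y)$. Moreover,

$$ x^* \in \mathcal{G}_J \quad \text{and} \quad \mathrm{sign}(D^* x_0) = \mathrm{sign}(D^* x^*). $$

Note that it is possible to choose $\lambda$ proportional to the noise
level $\|w\|_2$. Hence, for $\|w\|_2$ small enough, equation (9) gives

$$ \|x^* - x_0\|_2 = O(\|w\|_2). $$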
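
The closed form (9) can be checked numerically on a toy denoising problem. The sketch below is an illustration under our own assumptions: $\Phi = \mathrm{Id}$, $D = D_{\mathrm{DIF}}$, a box signal, a small deterministic perturbation $w$ and a hand-picked $\lambda = 0.5$ (the constants $c_J$ and $\tilde{c}_J$ of the theorem are not computed). It builds $x^*$ from (9) and verifies that a vector $\sigma$ on the cosupport with $\|\sigma\|_\infty < 1$ cancels the first-order condition $\Phi^*(\Phi x^* - y) + \lambda D_I s_I + \lambda D_J \sigma = 0$.

```python
import numpy as np

n = 9
idx = np.arange(n - 1)
d = np.zeros((n, n - 1))
d[idx, idx], d[idx + 1, idx] = -1.0, 1.0             # D_DIF as in (6)

x0 = 3.0 * np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)
w = 0.01 * np.cos(np.arange(n))                      # small illustrative noise
y = x0 + w                                           # Phi = Id (denoising)
lam = 0.5                                            # hand-picked, not from the paper

corr = d.T @ x0
support = np.flatnonzero(np.abs(corr) > 1e-10)
cosupport = np.flatnonzero(np.abs(corr) <= 1e-10)
d_i, d_j = d[:, support], d[:, cosupport]
s_i = np.sign(corr[support])

# For Phi = Id, A^[J] is the orthogonal projector onto G_J = (Im D_J)^perp.
a = np.eye(n) - d_j @ np.linalg.pinv(d_j)

x_star = x0 + a @ w - lam * a @ d_i @ s_i            # equation (9)

# Vector sigma solving the first-order condition on the cosupport.
sigma = -np.linalg.pinv(d_j) @ ((x_star - y) / lam + d_i @ s_i)
residual = (x_star - y) + lam * d_i @ s_i + lam * d_j @ sigma
```

In this example the residual vanishes up to rounding, $\sigma$ stays strictly inside the unit $\ell^\infty$ ball, and the jumps of $x^*$ keep the signs of those of $x_0$, as the theorem predicts for a small enough noise.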
B. Noiseless Identifiability

In the noiseless case, $w = 0$, the criterion IC can be used
to test identifiability.

Definition 5. A vector $x_0$ is said to be identifiable if $x_0$ is the
unique solution of $\mathcal{P}_0(\Phi x_0)$.

We prove the following theorem.

Theorem 2. Let $x_0 \in \mathbb{R}^N$ be a fixed vector of D-cosupport
$J$. Suppose that $(H_J)$ holds and $\mathrm{IC}(\mathrm{sign}(D^* x_0)) < 1$. Then
$x_0$ is identifiable.

C. Robustness to Bounded Noise 

Our last contribution defines a stronger criterion that ensures 
robustness to an arbitrary bounded noise. 

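Definition 6. The analysis Recovery Criterion (RC) of $I \subseteq
\{1, \dots, P\}$ is defined as

$$ \mathrm{RC}(I) = \max_{\substack{p_I \in \mathbb{R}^{|I|} \\ \|p_I\|_\infty \leq 1}} \ \min_{u \in \mathrm{Ker}\, D_J} \|\Omega^{[J]} p_I - u\|_\infty. $$

Note that if $I$ is the D-support of $x_0$, $\mathrm{RC}(I) < 1$ implies
$\mathrm{IC}(\mathrm{sign}(D^* x_0)) < 1$.

The following theorem shows that if the parameter $\lambda$ is
big enough, then $\mathcal{P}_\lambda(y)$ recovers a unique vector which is
close enough in the $\ell^2$ sense and lives in the same $\mathcal{G}_J$ as the
unknown signal $x_0$.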

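Theorem 3. Let $I$ be a fixed D-support and $J$ its associated
D-cosupport $J = I^c$. Suppose that $(H_J)$ holds. If $\mathrm{RC}(I) < 1$
and

$$ \lambda = \rho \|w\|_2 \frac{c_J}{1 - \mathrm{RC}(I)} \quad \text{with} \quad \rho > 1, $$

where $c_J$ is defined as

$$ c_J = \|D_J^+ \Phi^* (\Phi A^{[J]} \Phi^* - \mathrm{Id})\|_{2,\infty}, $$

then for every $x_0$ of D-support $I$, there exists a unique solution
$x^*$ of $\mathcal{P}_\lambda(y)$ of D-support included in $I$, such that
$\|x_0 - x^*\|_2 = O(\|w\|_2)$. More precisely,

$$ \|x_0 - x^*\|_2 \leq \|A^{[J]}\|_{2,2} \|w\|_2 \left( \|\Phi^*\|_{2,2} + \frac{\rho\, c_J}{1 - \mathrm{RC}(I)} \|D_I\|_{2,\infty} \right). $$
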



III. Related Works 

A. Previous Works on Synthesis Identifiability and Robustness 

Several previous works have studied identifiability and noise
robustness of sparse synthesis regularization. We recall that
synthesis regularization (4) reads

$$ \min_{\alpha \in \mathbb{R}^P} \frac{1}{2} \|y - \Psi\alpha\|_2^2 + \lambda \|\alpha\|_1, $$

where $\Psi = \Phi D$ and $x = D\alpha$. Fuchs defines in [28] a criterion
$\mathrm{IC}_S$ which is a specialization of our criterion IC introduced
in Definition 4 to the case where $D = \mathrm{Id}$.

Definition 7. Let $s \in \{-1, 0, +1\}^P$, $I$ its support and $J$ its
cosupport. We suppose $\Psi_I$ has full rank. The Sign Criterion
$\mathrm{IC}_S$ of a sign vector $s$ associated to a support $I$ is defined as

$$ \mathrm{IC}_S(s) = \|\Omega_S^{[J]} s_I\|_\infty \quad \text{where} \quad \Omega_S^{[J]} = \Psi_J^* \Psi_I^{+,*}. $$

Fuchs shows the following result.

Theorem ([28]). Let $\alpha_0 \in \mathbb{R}^P$ be a fixed vector of support
$I$. If $\Psi_I$ has full rank and $\mathrm{IC}_S(\mathrm{sign}(\alpha_0)) < 1$, then $\alpha_0$ is
identifiable, i.e. it is the unique solution of (5) for $y = \Psi\alpha_0$.

The work of Tropp [29], [30] developed in the synthesis case
a condition named Exact Recovery Condition (ERC) on the
support.

Definition 8. The Exact Recovery Condition (ERC) of $I \subseteq
\{1, \dots, P\}$ is defined as

$$ \mathrm{ERC}(I) = \|\Omega_S^{[J]}\|_{\infty,\infty}. $$

Tropp proves that $\mathrm{ERC}(I) < 1$ is a sufficient condition of
identifiability and stability of the synthesis Lasso.

Theorem ([29]). Let $I$ be a fixed support. Suppose that $\Psi_I$ has
full rank. If $\mathrm{ERC}(I) < 1$ and $\lambda$ is large enough, then for every
$\alpha_0$ of support $I$, there exists a unique solution $\alpha^*$ of (4) for
$y = \Psi\alpha_0 + w$ of support included in $I$, verifying
$\|\alpha_0 - \alpha^*\|_2 = O(\|w\|_2)$.

Note that $\mathrm{IC}_S(s)$ depends both on the sign and the support,
while ERC depends only on the support, and we have the
general inequality $\mathrm{IC}_S(s) \leq \mathrm{ERC}(I)$.

In the analysis case where $D = \mathrm{Id}$, the criterion of Tropp
and ours are equivalent. This is also true for the criterion of
Fuchs and ours.

Proposition 1. If $D = \mathrm{Id}$, then $\mathrm{ERC}(I) = \mathrm{RC}(I)$ and

$$ \mathrm{IC}(\mathrm{sign}(D^* x_0)) = \mathrm{IC}_S(\mathrm{sign}(D^* x_0)). $$

Let us mention that there exist several other criteria ensuring
both identifiability and noise robustness in the synthesis case.
This includes criteria based on coherence (see [31] for a review)
and RIP-based compressed sensing theory that requires
that $\Phi$ is a realization of certain random matrix ensembles.



B. Previous Works on Analysis Identifiability and Robustness 

To the best of our knowledge, the only previous works that
study the performance of sparse analysis regularization are the
papers [32] and [21].

The work [32] proves a strong robustness to noise with
overwhelming probability on the matrix $\Phi$ when $D$ is a tight
frame and $\Phi$ a realization of certain random matrix ensembles
satisfying a condition named D-RIP. This setting is thus
quite far from ours.

The work of Nam et al. [21] is much closer to our results. It
studies noiseless identifiability using $\ell^0$ and $\ell^1$ sparse analysis
regularization. Their main result on $\ell^1$ analysis identifiability
is the following theorem.



Theorem ([21]). Let $M^*$ be a basis matrix of $\mathrm{Ker}\, \Phi$ and $I$
a fixed D-support such that the matrix $D_J^* M^*$ has full rank.
Let $x_0 \in \mathcal{G}_J$ be a fixed vector. If $\mathrm{IC}_0(\mathrm{sign}(D^* x_0)) < 1$ where

$$ \mathrm{IC}_0(s) = \|(M D_J)^+ M D_I s_I\|_\infty, $$

then $x_0$ is identifiable.

Note that $\mathrm{IC}_0(s) < 1$ does not imply $\mathrm{IC}(s) < 1$, nor
the converse. Numerical results suggest that their criterion is
most of the time sharper than IC. However, the condition
$\mathrm{IC}_0(s) < 1$ does not imply in general a robustness to noise,
even for a small one. Moreover, let $x_0$ be a fixed vector, and
denote $s = \mathrm{sign}(D^* x_0)$ where $I$ is its D-support and $y =
\Phi x_0 + w$. If $\mathrm{IC}_0(s) < 1$ but $\mathrm{IC}(s) > 1$, then any solution $x^*$
of $\mathcal{P}_\lambda(y)$, for $\lambda$ close to zero, is such that the D-support of
$x^*$ is not included in $I$. One can thus find vectors $x_0$ with
$\mathrm{IC}_0(s) < 1$ but where the recovery error is arbitrarily large, whatever
the amplitude $\|w\|_2$ of the noise.

IV. Examples 

This section details algorithms to compute identifiability 
criteria IC and RC, together with a study of total variation, 
shift invariant Haar transform and Fused Lasso regularizations. 

A. Computing Sparse Analysis Regularization 

It is not the focus of this paper to give a full study of 
optimization schemes that can be used to solve the analysis 
regularization. 



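In the case where $\Phi = \mathrm{Id}$ (denoising), $\mathcal{P}_\lambda(y)$ is strictly
convex, and one can compute its unique solution $x^*$ by solving
an equivalent dual problem [35]

$$ x^* = y + D\alpha^* \quad \text{where} \quad \alpha^* \in \operatorname*{argmin}_{\|\alpha\|_\infty \leq \lambda} \|y + D\alpha\|_2. $$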

In the general case, it is possible to use a primal-dual method
such as the algorithm of Chambolle and Pock [36]. One way
is to rewrite the optimization problem as follows

$$ \min_{x \in \mathbb{R}^N} F(K(x))
\quad \text{where} \quad
\begin{cases}
F(g, u) = \frac{1}{2} \|y - g\|_2^2 + \lambda \|u\|_1, \\
K(x) = (\Phi x, D^* x).
\end{cases} $$


B. Computing the Criteria 

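In the case where $\mathrm{Ker}(D_J) \neq \{0\}$, computing
$\mathrm{IC}(\mathrm{sign}(D^* x_0))$ necessitates the resolution of a convex problem.
This optimization is re-written as

$$ \mathrm{IC}(\mathrm{sign}(D^* x_0)) = \min_{u \in \mathbb{R}^{|J|}} \|\Omega^{[J]} \mathrm{sign}(D^* x_0)_I - u\|_\infty + \iota_{\mathrm{Ker}(D_J)}(u), $$

where $\iota_{\mathrm{Ker}(D_J)}$ is the characteristic function of $\mathrm{Ker}(D_J)$

$$ \iota_{\mathrm{Ker}(D_J)}(u) =
\begin{cases}
0 & \text{if } u \in \mathrm{Ker}(D_J), \\
+\infty & \text{else.}
\end{cases} $$
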



This requires the optimization of a sum of two simple
functions, i.e. functions whose proximal operators are easy to
compute. The proximal operator $\mathrm{Prox}_f$ of a convex lower
semicontinuous function $f$ is defined as

$$ \mathrm{Prox}_f(x) = \operatorname*{argmin}_{z} \frac{1}{2} \|z - x\|_2^2 + f(z). $$

Such a minimization can hence be achieved using the Douglas-Rachford
splitting algorithm [37]. Indeed, the proximity operator
of $\iota_{\mathrm{Ker}(D_J)}$ is the orthogonal projector on $\mathrm{Ker}(D_J)$, and
the proximal operator of $\gamma \|\cdot\|_\infty$ can be computed as

$$ \mathrm{Prox}_{\gamma \|\cdot\|_\infty}(x) = x - \mathrm{Proj}_{\{\|\cdot\|_1 \leq \gamma\}}(x), $$

where $\mathrm{Proj}_{\{\|\cdot\|_1 \leq \gamma\}}$ is the projection on the $\ell^1$ ball
$\{ x \setminus \|x\|_1 \leq \gamma \}$. This projection is computed as
explained for instance in [38].
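
The proximal operator of the $\ell^\infty$ norm described above can be sketched as follows. The projection on the $\ell^1$ ball is implemented here with a standard sort-based method, which is our own choice and may differ from the procedure of [38]; the function names are illustrative, and `radius` and `gamma` are assumed strictly positive.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} (sort-based method)."""
    if np.abs(v).sum() <= radius:
        return v.copy()                           # already inside the ball
    u = np.sort(np.abs(v))[::-1]                  # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1] # last index kept after thresholding
    theta = (css[rho] - radius) / (rho + 1.0)     # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(v, gamma):
    """Prox of gamma * ||.||_inf, obtained from the l1-ball projection (Moreau identity)."""
    return v - project_l1_ball(v, gamma)

v = np.array([3.0, 1.0, -2.0])
p = prox_linf(v, 1.0)                             # clips the largest magnitude from 3 to 2
```

Together with the orthogonal projector on $\mathrm{Ker}(D_J)$, this operator is one of the two building blocks a Douglas-Rachford iteration for IC would need.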

Unfortunately, computing RC requires solving a combinatorial
optimization problem which is not convex. Recall
that

$$ \mathrm{RC}(I) = \max_{\|p_I\|_\infty \leq 1} \ \min_{u \in \mathrm{Ker}\, D_J} \|\Omega^{[J]} p_I - u\|_\infty. $$

A stronger criterion, which is easy to compute, is obtained by
selecting $u = 0$ in $\mathrm{Ker}\, D_J$

$$ \mathrm{wRC}(I) = \|\Omega^{[J]}\|_{\infty,\infty}. $$

Note that for every vector $x_0$ with D-support $I = \mathrm{supp}(D^* x_0)$,
we have the following inequalities

$$ \mathrm{IC}(\mathrm{sign}(D^* x_0)) \leq \mathrm{RC}(I) \leq \mathrm{wRC}(I). $$
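
Since $\|\cdot\|_{\infty,\infty}$ is the maximum absolute row sum, wRC is cheap to evaluate once $\Omega^{[J]}$ is formed. The following sketch assumes NumPy, a nonempty cosupport for which $(H_J)$ holds, and hypothetical helper names; the example uses $\Phi = \mathrm{Id}$ and $D = D_{\mathrm{DIF}}$, for which $\mathrm{Ker}\, D_J = \{0\}$ so that IC reduces to $\|\Omega^{[J]} s_I\|_\infty$.

```python
import numpy as np

def omega_operator(phi, d, support, tol=1e-10):
    """Omega^[J] = D_J^+ (Phi^* Phi A^[J] - Id) D_I for the D-support `support`."""
    n, p = d.shape
    cosupport = np.setdiff1d(np.arange(p), support)
    d_i, d_j = d[:, support], d[:, cosupport]
    _, sing, vt = np.linalg.svd(d_j.T)
    u = vt[int((sing > tol).sum()):].T                   # basis of G_J = Ker D_J^*
    a = u @ np.linalg.solve(u.T @ phi.T @ phi @ u, u.T)  # A^[J] of Definition 3
    return np.linalg.pinv(d_j) @ (phi.T @ phi @ a - np.eye(n)) @ d_i

def weak_recovery_criterion(phi, d, support):
    """wRC(I): maximum absolute row sum of Omega^[J]."""
    return float(np.abs(omega_operator(phi, d, support)).sum(axis=1).max())

n = 9
idx = np.arange(n - 1)
d = np.zeros((n, n - 1))
d[idx, idx], d[idx + 1, idx] = -1.0, 1.0                 # D_DIF as in (6)
support = np.array([2, 5])
omega = omega_operator(np.eye(n), d, support)
wrc = weak_recovery_criterion(np.eye(n), d, support)
ic = float(np.abs(omega @ np.array([1.0, -1.0])).max())  # IC for signs (+1, -1)
```

For this support the sketch gives $\mathrm{wRC}(I) = 1$ while the sign pattern $(+1, -1)$ yields $\mathrm{IC} = 2/3$, consistent with the chain of inequalities above.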



C. Total Variation Denoising 

Discrete total variation uses $D = D_{\mathrm{DIF}}$ defined in (6).
We recall that the total variation union of subspaces model is
formed by $\bigcup_k \Theta_k$ where $\Theta_k$ is the set of piecewise constant
signals with $k - 1$ steps. We now define a subclass of piecewise
constant signals.

Definition 9. A signal is said to contain a staircase sub-signal
if there exists $i \in \{1, \dots, |I| - 1\}$ such that

$$ \mathrm{sign}(D_I^* x)_i = \mathrm{sign}(D_I^* x)_{i+1} = \pm 1. $$

Figure 1 shows examples of signals with and without
staircase sub-signals.

Fig. 1: Top line: Signals $x$ with 2 discontinuities. Bottom line:
Associated dual vector $m$.

The following proposition studies the robustness of total 
variation denoising. 

Proposition 2. We consider the denoising case, $\Phi =
\mathrm{Id}$. If $x$ does not contain a staircase sub-signal, then
$\mathrm{IC}(\mathrm{sign}(D^* x)) < 1$. Otherwise, $\mathrm{IC}(\mathrm{sign}(D^* x)) = 1$.

Proof: Let $x^*$ be a solution of $\mathcal{P}_\lambda(y)$ with D-cosupport $J$
and $I = J^c$. Using Lemma 1, there exists $\sigma \in \Sigma_{y,\lambda}(x^*)$ such
that $\|\sigma\|_\infty \leq 1$. Since $D_J^+ A^{[J]} = 0$, we have $\Omega^{[J]} = -D_J^+ D_I$.
We denote $m$ the vector defined as

$$ m_I = s_I = \mathrm{sign}(D^* x)_I, \qquad m_J = \sigma = \Omega^{[J]} s_I. $$

The vector $\sigma$ satisfies $(D_J^* D_J)\sigma = -(D_J^* D_I) s_I$. One can show
that this implies that $m$ is the solution of a discrete Poisson
equation

$$ \forall j \in J, \ (\Delta m)_j = 0 \quad \text{and} \quad m_0 = m_N = 0, $$

where $\Delta = D^* D$ is a discrete Laplacian operator. This implies
that for $i_1 < k < i_2$ where $i_1, i_2$ are consecutive indexes of
$I$, $m$ is obtained by linearly interpolating (see Figure 1) the
values $m_{i_1}$ and $m_{i_2}$, i.e.

$$ m_k = \rho\, m_{i_1} + (1 - \rho)\, m_{i_2} \quad \text{where} \quad \rho = \frac{i_2 - k}{i_2 - i_1}. $$

Hence, if $x$ does not contain a staircase sub-signal, one has
$\|\Omega^{[J]} s_I\|_\infty < 1$. On the contrary, if there is $i_1$ such that $s_{i_1} =
s_{i_2}$, where $i_1, i_2$ are consecutive indexes of $I$, then for every
$i_1 < j < i_2$, $m_j = s_{i_1} = \pm 1$ which implies $\mathrm{IC}(\mathrm{sign}(D^* x)) =
1$. ■
This proposition together with Theorem 1 shows that if
a signal does not have a staircase sub-signal, TV denoising
is robust to a small noise. This means that if $w$ is small
enough, for $\lambda$ small, the TV denoising of $x_0 + w$ has the same
discontinuities as $x_0$. However, the presence of a staircase in
a signal implies that no robustness, even to a small noise, can
be ensured.
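
Proposition 2 is easy to observe numerically. In the sketch below (NumPy, illustrative function name, and a tolerance `tol` for detecting the jumps, all our own assumptions) we use the identity $\Omega^{[J]} = -D_J^+ D_I$ valid for $\Phi = \mathrm{Id}$, and the fact that the columns of $D_{\mathrm{DIF}}$ are linearly independent, so that $\mathrm{Ker}\, D_J = \{0\}$ and $\mathrm{IC}(s) = \|\Omega^{[J]} s_I\|_\infty$.

```python
import numpy as np

def tv_identifiability_criterion(x, tol=1e-10):
    """IC(sign(D^* x)) for TV denoising (Phi = Id, D = D_DIF, Ker D_J = {0})."""
    n = len(x)
    idx = np.arange(n - 1)
    d = np.zeros((n, n - 1))
    d[idx, idx], d[idx + 1, idx] = -1.0, 1.0
    corr = d.T @ x
    support = np.flatnonzero(np.abs(corr) > tol)
    cosupport = np.flatnonzero(np.abs(corr) <= tol)
    s_i = np.sign(corr[support])
    omega = -np.linalg.pinv(d[:, cosupport]) @ d[:, support]   # Omega^[J] = -D_J^+ D_I
    return float(np.abs(omega @ s_i).max())

bump = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0], dtype=float)       # jumps of opposite signs
stairs = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=float)     # two jumps of the same sign
ic_bump = tv_identifiability_criterion(bump)
ic_stairs = tv_identifiability_criterion(stairs)
```

The box signal gives a criterion strictly below 1 (the interpolated vector $m$ peaks at $2/3$ on the cosupport), whereas the staircase signal saturates at 1.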



Corollary 1. If $|I| \geq 2$ is such that $i \in I$ implies $i - 1 \notin I$, and
$D = D_{\mathrm{DIF}}$, then $\mathrm{RC}(I) = 1$.

Proof: If $|I| \geq 2$, there exists a signal $x$ which contains
a staircase sub-signal, hence $1 = \mathrm{IC}(\mathrm{sign}(D^* x)) \leq \mathrm{RC}(I)$.
Since there is no signal $x \in \mathbb{R}^N$ such that $\mathrm{IC}(\mathrm{sign}(D^* x)) >
1$, we conclude. ■
This corollary shows that in the case of total variation
regularization, one cannot expect cospace robustness, i.e.
conservation of the discontinuities, even for a small noise.



signals. More precisely, we consider the collection of box 
signals 

I elsewhere. 

Figure 2 displays the average and standard deviation of IC for
three different values of $\eta$ as a function of $Q/N \in [0.4, 1]$.
They are estimated numerically using Monte-Carlo simulation,
using 1000 samples for each redundancy. Remark that IC
increases when the signal converges to a single spike signal
and the redundancy $Q/N$ diminishes.

Fig. 2: Evolution of IC for a compressed sensing matrix with
a shift invariant Haar dictionary. On the left side, a box signal.
On the right, the dotted line represents the average IC as a
function of $Q/N$. The vertical lines represent the interval
[mean(IC) - std(IC), mean(IC) + std(IC)]. The horizontal
line indicates the saturation level IC = 1.



D. Invariant Haar Transform 

Sparse analysis regularization using a shift invariant Haar
dictionary is efficient to recover piecewise constant signals.
This dictionary is defined using a set of dilated Haar filters

$$ \psi_i^{(j)} =
\begin{cases}
+1 & \text{if } 0 \leq i < 2^j, \\
-1 & \text{if } -2^j \leq i < 0, \\
0 & \text{else.}
\end{cases} $$

We define the translation invariant Haar dictionary as

$$ D_H^* x = \left( \psi^{(j)} \star x \right)_{0 \leq j < \log_2(N)}. $$

The analysis regularization $\|D_H^* x\|_1$ is a sum of the TV norms
of filtered versions of the signal; it can thus be understood
as some kind of multi-scale total variation. We consider the
case where $\Phi$ is a realization of the Gaussian matrix ensemble,
which has i.i.d. entries distributed according to the normal law
$\mathcal{N}(0, 1)$. Figure 2 shows the evolution of IC as a function
of the redundancy $Q/N$ of the operator $\Phi$ for different box



E. Fused Lasso 

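Fused Lasso is introduced in [18]. It is equivalent to $\mathcal{P}_\lambda(y)$
when using

$$ D = [D_{\mathrm{DIF}} \ \ \varepsilon\, \mathrm{Id}], $$

where $\varepsilon$ is a positive real number. The associated union of
subspaces (7) is $\bigcup_k \Theta_k$ where $\Theta_k$ is the set of sums of $k$
interval indicators, i.e. a signal $x \in \Theta_k$ can be written as

$$ x = \sum_{i=1}^k \gamma_i 1_{[a_i, b_i]}, $$

where $\gamma_i \in \mathbb{R}$ and $a_i \leq b_i < a_{i+1}$.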
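
As a short illustration of this dictionary (NumPy, with a hypothetical helper name and an arbitrary value of $\varepsilon$ chosen only for the example), the analysis prior splits into a total variation term plus $\varepsilon$ times the $\ell^1$ norm of the signal.

```python
import numpy as np

def fused_lasso_dictionary(n, eps):
    """Concatenation D = [D_DIF, eps * Id], of size n x (2n - 1)."""
    idx = np.arange(n - 1)
    d_dif = np.zeros((n, n - 1))
    d_dif[idx, idx], d_dif[idx + 1, idx] = -1.0, 1.0
    return np.hstack([d_dif, eps * np.eye(n)])

x = np.array([0.0, 1.0, 1.0, 0.0, 0.0, -2.0])   # sum of two interval indicators
eps = 0.5                                        # illustrative weight
d = fused_lasso_dictionary(len(x), eps)
prior = float(np.abs(d.T @ x).sum())             # ||D^* x||_1
tv_part = float(np.abs(np.diff(x)).sum())        # total variation term
l1_part = eps * float(np.abs(x).sum())           # weighted sparsity term
```

Changing `eps` shifts the balance between favoring few jumps and favoring few nonzero samples.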

We consider the case where $\Phi$ is a realization of the
Gaussian matrix ensemble, which has i.i.d. entries distributed
according to the normal law $\mathcal{N}(0, 1)$. We consider the collection
of sums of two indicators

$$ x_{\eta,\rho} = 1_{[(\frac{1}{2} - \eta - \rho)N,\ (\frac{1}{2} - \rho)N]} + 1_{[(\frac{1}{2} + \rho)N,\ (\frac{1}{2} + \eta + \rho)N]}. \qquad (10) $$

We fixed p = ^-A^, e = Figure [3] shows the evolution 

of the mean and standard deviation of IC as a function of
the redundancy $Q/N \in [0.5, 1]$ of $\Phi$ for different box signals.
They are estimated numerically using Monte-Carlo simulation,
using 1000 samples for each redundancy. Remark that IC
diminishes when the signal converges to two spikes and when
the redundancy $Q/N$ increases. Another choice of $\varepsilon$ may lead
to different results depending on whether $D$ favors the $\ell^1$-sparsity or the
total variation sparsity.

Fig. 3: Evolution of IC for a compressed sensing matrix with a
Fused Lasso dictionary. On the left side, a signal with a fixed 
interval size rj = O.G25iV, 0.12507V, 0.2iV. On the right, the 
average and the standard deviation of IC as a function of the 
redundancy Q/N of the random matrix. 



V. Proofs 

This section details the proofs of Theorems 1 to 3. The
objective function $\mathcal{L}_{y,\lambda}$ minimized in $\mathcal{P}_\lambda(y)$ is

$$ \mathcal{L}_{y,\lambda}(x) = \frac{1}{2} \|y - \Phi x\|_2^2 + \lambda \|D^* x\|_1. $$

We recall that we suppose that condition $(H_0)$ holds in every
statement. The following lemma, which is at the heart of the
proofs of our contributions, details the first order optimality
conditions for the analysis variational problem $\mathcal{P}_\lambda(y)$.

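Lemma 1. A vector $x^*$ is a solution of $\mathcal{P}_\lambda(y)$ if, and only if,
there exists $\sigma \in \mathbb{R}^{|J|}$, where $J$ is the D-cosupport of $x^*$, such
that

$$ \sigma \in \Sigma_{y,\lambda}(x^*) \qquad (11) $$

where, $I = J^c$ being the D-support,

$$ \Sigma_{y,\lambda}(x^*) = \{ \sigma \in \mathbb{R}^{|J|} \setminus \Phi^*(\Phi x^* - y) + \lambda D_I s_I + \lambda D_J \sigma = 0 \ \text{and} \ \|\sigma\|_\infty \leq 1 \} \qquad (12) $$

and $s = \mathrm{sign}(D^* x^*)$.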

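Proof: The subdifferential $\partial F$ of a real valued convex
lower semicontinuous function $F : \mathbb{R}^N \to \mathbb{R}$ is the multifunction
defined by

$$ \partial F(x_0) = \{ g \in \mathbb{R}^N \setminus \forall x \in \mathbb{R}^N, \ F(x) \geq F(x_0) + \langle g, x - x_0 \rangle \}. $$

Note that $x_0$ is a minimum of $F$ if, and only if, $0 \in \partial F(x_0)$.
Indeed, if $0 \in \partial F(x_0)$, then for every $x \in \mathbb{R}^N$, $F(x) \geq F(x_0)$,
meaning that $x_0$ is a minimum of $F$ over $\mathbb{R}^N$. The subdifferential
of $\mathcal{L}_{y,\lambda}$ at $x$ is

$$ \partial \mathcal{L}_{y,\lambda}(x) = \{ \Phi^*(\Phi x - y) + \lambda D u \setminus u \in U(x) \}, $$

where

$$ U(x) = \{ u \in \mathbb{R}^P \setminus u_I = \mathrm{sign}(D^* x)_I \ \text{and} \ \|u_J\|_\infty \leq 1 \}. $$

Hence $0 \in \partial \mathcal{L}_{y,\lambda}(x)$ is equivalent to the existence of $u \in \mathbb{R}^P$
such that $u_I = \mathrm{sign}(D^* x)_I$ and $\|u_J\|_\infty \leq 1$ satisfying

$$ \Phi^*(\Phi x - y) + \lambda D u = 0. $$

Defining $\sigma = u_J$, it is equivalent to the existence of $\sigma \in
\Sigma_{y,\lambda}(x)$ with $\|\sigma\|_\infty \leq 1$. ■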
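The following lemma characterizes the normal cone at zero
of the subdifferential of $\mathcal{L}_{y,\lambda}$ at a minimizer.

Lemma 2. Let $x^*$ be a solution of $\mathcal{P}_\lambda(y)$ of D-support $I^*$.
Suppose there exist $J \subseteq (I^*)^c$ and $\sigma \in \Sigma_{y,\lambda}(x^*)$ with
$\|\sigma_J\|_\infty < 1$. Then,

$$ N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0) \subseteq (\mathrm{Im}\, D_J)^\perp = \mathcal{G}_J, $$

where $N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0)$ is the normal cone at zero of the
subdifferential of $\mathcal{L}_{y,\lambda}$ at $x^*$ defined by

$$ N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0) = \{ z \in \mathbb{R}^N \setminus \forall d \in \partial \mathcal{L}_{y,\lambda}(x^*), \ \langle z, d \rangle \leq 0 \}. $$

Moreover, if $J$ is the D-cosupport of $x^*$, then
$N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0) = \mathcal{G}_J$.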

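Proof: Let $I = J^c$. Since $\|\sigma_J\|_\infty < 1$, one remarks that
$\bar{u}$ defined by

$$ \bar{u}_{I^*} = \mathrm{sign}(D_{I^*}^* x^*) \quad \text{and} \quad \bar{u}_{(I^*)^c} = \sigma $$

is such that $\|\bar{u}_J\|_\infty < 1$ and

$$ \Phi^*(\Phi x^* - y) + \lambda D \bar{u} = 0. $$

We introduce $\varepsilon > 0$ such that $\|\sigma_J\|_\infty = 1 - \varepsilon$. Consider the
set

$$ \mathcal{U} = \{ u \in \mathbb{R}^P \setminus \|u_J - \bar{u}_J\|_\infty \leq \varepsilon \ \text{and} \ u_I = \bar{u}_I \}. $$

For every $u \in \mathcal{U}$, we define

$$ d_u = \Phi^*(\Phi x^* - y) + \lambda D u, $$

and we denote $\mathcal{V} = \{d_u\}_{u \in \mathcal{U}}$.

Remark that

$$ d_u = \lambda D (u - \bar{u}) = \lambda D_I (u_I - \bar{u}_I) + \lambda D_J (u_J - \bar{u}_J). $$

Since $u_I = \bar{u}_I$, one has

$$ d_u = \lambda D_J (u_J - \bar{u}_J). $$

Let $z \in N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0)$ and let $u \in \mathcal{U}$. Note that

$$ \|u_J\|_\infty \leq \|u_J - \bar{u}_J\|_\infty + \|\bar{u}_J\|_\infty \leq 1, $$

and

$$ u_{I^*} = \mathrm{sign}(D_{I^*}^* x^*) \quad \text{and} \quad \|u_{(I^*)^c}\|_\infty \leq 1. $$

Hence, $d_u \in \partial \mathcal{L}_{y,\lambda}(x^*)$. By definition of the normal cone,
$\langle z, d \rangle \leq 0$ for every $d \in \partial \mathcal{L}_{y,\lambda}(x^*)$. In particular,

$$ \forall u \in \mathcal{U}, \quad \langle z, d_u \rangle \leq 0. $$

Remark that for every $u \in \mathcal{U}$, one has $2\bar{u} - u \in \mathcal{U}$ and also
$d_{2\bar{u} - u} = -d_u$. Indeed,

$$ d_{2\bar{u} - u} = \Phi^*(\Phi x^* - y) + \lambda D (2\bar{u} - u)
= \Phi^*(\Phi x^* - y) + \lambda D \bar{u} - \lambda D (u - \bar{u})
= -d_u. $$

Moreover,

$$ \|(2\bar{u} - u)_J - \bar{u}_J\|_\infty = \|u_J - \bar{u}_J\|_\infty \leq \varepsilon $$

and $(2\bar{u} - u)_I = 2\bar{u}_I - u_I = \bar{u}_I$. Hence

$$ \forall u \in \mathcal{U}, \quad \langle z, d_u \rangle \leq 0 \ \text{and} \ \langle z, -d_u \rangle = \langle z, d_{2\bar{u} - u} \rangle \leq 0. $$

Therefore,

$$ \forall u \in \mathcal{U}, \quad \langle z, d_u \rangle = 0. $$

Let $v \in \mathrm{Im}\, D_J \setminus \{0\}$. Remark there exists $\mu_v \in \mathbb{R}^*$ such
that

$$ \mu_v v = D_J \sigma_v \quad \text{and} \quad \|\sigma_v\|_\infty \leq \varepsilon. $$

We define then the vector $u$ as

$$ u_I = \bar{u}_I \quad \text{and} \quad u_J = \bar{u}_J + \sigma_v. $$

Note that $u$ is an element of $\mathcal{U}$ since $\|u_J - \bar{u}_J\|_\infty = \|\sigma_v\|_\infty \leq \varepsilon$.
Therefore,

$$ d_u = \lambda D_J (u_J - \bar{u}_J) = \lambda D_J \sigma_v = \lambda \mu_v v $$

is such that $\langle z, d_u \rangle = 0$ and $d_u \in \mathcal{V}$, i.e. $\mathrm{Im}\, D_J = \mathrm{Span}(\mathcal{V})$.
Finally, $\langle z, v \rangle = 0$ for every $v \in \mathrm{Im}\, D_J$.
We conclude that $N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0)$ is included in $(\mathrm{Im}\, D_J)^\perp =
\mathcal{G}_J$.

Suppose now that $J$ is the D-cosupport of $x^*$. We prove
that $\mathcal{G}_J \subseteq N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0)$.
Remark that $\partial \mathcal{L}_{y,\lambda}(x^*) \subseteq \mathrm{Im}\, D_J$. Indeed, let $d \in \partial \mathcal{L}_{y,\lambda}(x^*)$.
We write $d = \Phi^*(\Phi x^* - y) + \lambda D u$ with $u_I = \mathrm{sign}(D_I^* x^*)$
and $\|u_J\|_\infty \leq 1$. Since $0 \in \partial \mathcal{L}_{y,\lambda}(x^*)$, one has

$$ d = \lambda D (u - \bar{u}), $$

and since $u_I = \bar{u}_I$, one has

$$ d = \lambda D_J (u_J - \bar{u}_J). $$

Hence, $(\mathrm{Im}\, D_J)^\perp$ is included in the normal cone. ■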



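The following lemma gives a sufficient condition to guarantee
the uniqueness of the solution of $\mathcal{P}_\lambda(y)$.

Lemma 3. Let $x^*$ be a vector of D-support $I^*$. Suppose there
exist $\sigma \in \mathbb{R}^{|(I^*)^c|}$ and $J \subseteq (I^*)^c$ such that $(H_J)$ holds,

$$ \sigma \in \Sigma_{y,\lambda}(x^*) \quad \text{and} \quad \|\sigma_J\|_\infty < 1. $$

Then, $x^*$ is the unique solution of $\mathcal{P}_\lambda(y)$.

Proof: We decompose $\mathcal{L}_{y,\lambda}$ in two functions:

$$ \mathcal{L}_{y,\lambda}(x) = q(x) + \lambda \|D^* x\|_1 \quad \text{where} \quad q(x) = \frac{1}{2} \|y - \Phi x\|_2^2. $$

Let $h \in \mathbb{R}^N \setminus \{0\}$. Two different cases occur:

1) If $h \notin \mathcal{G}_J$, then using Lemma 2, $h \notin N_{\partial \mathcal{L}_{y,\lambda}(x^*)}(0)$ and
there exists $d \in \partial \mathcal{L}_{y,\lambda}(x^*)$ such that $\langle d, h \rangle > 0$ and

$$ \mathcal{L}_{y,\lambda}(x^* + h) \geq \mathcal{L}_{y,\lambda}(x^*) + \langle d, h \rangle > \mathcal{L}_{y,\lambda}(x^*). $$

2) If $h \in \mathcal{G}_J$, observe that $q$ is strongly convex on $\mathcal{G}_J$ since
$(H_J)$ holds. Hence,

$$ \mathcal{L}_{y,\lambda}(x^* + h) > q(x^*) + \langle \nabla q(x^*), h \rangle + \lambda \|D^* x^*\|_1 + \lambda \langle v, h \rangle, $$

where $v \in \partial \|D^* \cdot\|_1(x^*)$ is such that $\lambda v + \nabla q(x^*) = 0$.
Then,

$$ \mathcal{L}_{y,\lambda}(x^* + h) > \mathcal{L}_{y,\lambda}(x^*). $$

In summary, for every $h \in \mathbb{R}^N \setminus \{0\}$, $\mathcal{L}_{y,\lambda}(x^* + h) > \mathcal{L}_{y,\lambda}(x^*)$,
and $x^*$ is the unique minimizer of $\mathcal{P}_\lambda(y)$. ■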
The following lemma gives an implicit equation satisfied by a solution $x^*$ of the problem $\mathcal{P}_\lambda(y)$. Note that $\mathcal{P}_\lambda(y)$ may have other solutions.

Lemma 4. Let $x^*$ be a solution of $\mathcal{P}_\lambda(y)$. Let $I$ be the D-support and $J$ the D-cosupport of $x^*$, and $s = \operatorname{sign}(D^* x^*)$. We suppose that $(H_J)$ holds. Then, $x^*$ satisfies

\[ x^* = A^{[J]} \Phi^* y - \lambda A^{[J]} D_I s_I. \tag{13} \]

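Proof: Using the first-order condition (Lemma 1), there exists $\sigma \in \Sigma_{y,\lambda}(x^*)$ satisfying

\[ \Phi^*(\Phi x^* - y) + \lambda D_I s_I + \lambda D_J \sigma = 0. \tag{14} \]

By definition, one has $x^* \in \mathcal{G}_J$, so $x^* \in (\operatorname{Im} D_J)^\perp$. Hence, we can write $x^* = U\alpha$. Since $U^* D_J = 0$, multiplying equation (14) on the left by $U^*$, we get

\[ U^* \Phi^* (\Phi U \alpha - y) + \lambda U^* D_I s_I = 0. \]

Since $U^* \Phi^* \Phi U$ is invertible, we conclude. ■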
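As a numerical illustration of equation (13), the following minimal sketch builds $A^{[J]} = U (U^* \Phi^* \Phi U)^{-1} U^*$ for a small finite-difference dictionary and evaluates the right-hand side of (13). The dictionary, the random operator `Phi`, the test signal and the helper names (`tv_dictionary`, `cospace_basis`, `a_matrix`) are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def tv_dictionary(n):
    """Finite-difference dictionary D (n x (n-1)): (D^* x)_i = x[i+1] - x[i]."""
    D = np.zeros((n, n - 1))
    for i in range(n - 1):
        D[i, i], D[i + 1, i] = -1.0, 1.0
    return D

def cospace_basis(D, J, tol=1e-10):
    """Orthonormal basis U of G_J = Ker(D_J^*), read off the SVD of D_J^*."""
    _, svals, Vt = np.linalg.svd(D[:, J].T, full_matrices=True)
    rank = int(np.sum(svals > tol))
    return Vt[rank:].T

def a_matrix(Phi, U):
    """A^{[J]} = U (U^* Phi^* Phi U)^{-1} U^*, well defined when (H_J) holds."""
    return U @ np.linalg.solve(U.T @ Phi.T @ Phi @ U, U.T)

rng = np.random.default_rng(0)                  # hypothetical random operator
N, Q = 6, 4
D = tv_dictionary(N)
Phi = rng.standard_normal((Q, N))
x0 = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])   # piecewise constant, one jump

corr = D.T @ x0
I = np.flatnonzero(np.abs(corr) > 1e-12)        # D-support
J = np.flatnonzero(np.abs(corr) <= 1e-12)       # D-cosupport
s_I = np.sign(corr[I])

U = cospace_basis(D, J)
A = a_matrix(Phi, U)

lam = 0.1
y = Phi @ x0                                    # noiseless observations
x_candidate = A @ Phi.T @ y - lam * (A @ D[:, I] @ s_I)   # right-hand side of (13)
```

By construction `x_candidate` lies in $\mathcal{G}_J$, and `A @ Phi.T @ Phi @ x0` returns `x0` because $x_0 \in \mathcal{G}_J$. Whether `x_candidate` actually solves $\mathcal{P}_\lambda(y)$ still has to be checked with the first-order condition.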

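Lemma 5. Let $y \in \mathbb{R}^Q$ and let $J$ be a D-cosupport such that $(H_J)$ holds, and $I = J^c$. Suppose $x^*$ satisfies

\[ x^* = A^{[J]} \Phi^* y - \lambda A^{[J]} D_I s_I, \]

where $s = \operatorname{sign}(D^* x^*)$. Then, $x^*$ is a solution of $\mathcal{P}_\lambda(y)$ if, and only if, there exists $\sigma$ satisfying one of the following conditions

\[ \sigma - \Omega^{[J]} s_I + \frac{1}{\lambda} \Pi^{[J]} y \in \operatorname{Ker} D_J \quad \text{and} \quad \|\sigma\|_\infty \leq 1, \tag{15} \]

or equivalently,

\[ \tilde\Pi^{[J]} y - \lambda \tilde\Omega^{[J]} s_I + \lambda D_J \sigma = 0 \quad \text{and} \quad \|\sigma\|_\infty \leq 1, \tag{16} \]

where $\tilde\Omega^{[J]} = (\Phi^* \Phi A^{[J]} - \mathrm{Id}) D_I$, $\tilde\Pi^{[J]} = \Phi^* (\Phi A^{[J]} \Phi^* - \mathrm{Id})$, $\Omega^{[J]} = D_J^+ \tilde\Omega^{[J]}$ and $\Pi^{[J]} = D_J^+ \tilde\Pi^{[J]}$. Moreover, if $\|\sigma\|_\infty < 1$ then $x^*$ is the unique solution of $\mathcal{P}_\lambda(y)$.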

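Proof: Remark that $x^*$ is an element of $\mathcal{G}_J$. According to Lemma 1, $x^*$ is a solution of $\mathcal{P}_\lambda(y)$ if, and only if, there exists $\sigma \in \Sigma_{y,\lambda}(x^*)$ such that

\[ \Phi^*(\Phi x^* - y) + \lambda D_I s_I + \lambda D_J \sigma = 0 \quad \text{and} \quad \|\sigma\|_\infty \leq 1. \]

Since $(H_J)$ holds, one can define $A^{[J]}$. We use the implicit equation (13):

\[ \Phi^*(\Phi A^{[J]} \Phi^* y - \lambda \Phi A^{[J]} D_I s_I - y) + \lambda D_I s_I + \lambda D_J \sigma = 0. \]

Factorizing the terms in front of $y$ and $s_I$, one has

\[ \Phi^*(\Phi A^{[J]} \Phi^* - \mathrm{Id}) y - \lambda (\Phi^* \Phi A^{[J]} - \mathrm{Id}) D_I s_I + \lambda D_J \sigma = 0, \]

which proves that

\[ \tilde\Pi^{[J]} y - \lambda \tilde\Omega^{[J]} s_I + \lambda D_J \sigma = 0 \quad \text{and} \quad \|\sigma\|_\infty \leq 1. \]

One has $U^* \tilde\Omega^{[J]} = 0$ and thus one remarks that $\tilde\Omega^{[J]} = D_J \Omega^{[J]}$. Similarly, we define $\Pi^{[J]}$ such that $\tilde\Pi^{[J]} = D_J \Pi^{[J]}$. Hence, the existence of $\sigma \in \Sigma_{y,\lambda}(x^*)$ such that $\|\sigma\|_\infty \leq 1$ is equivalent to

\[ D_J \sigma = D_J \Omega^{[J]} s_I - \frac{1}{\lambda} D_J \Pi^{[J]} y \quad \text{where} \quad \|\sigma\|_\infty \leq 1, \]

which in turn is equivalent to

\[ \sigma - \Omega^{[J]} s_I + \frac{1}{\lambda} \Pi^{[J]} y \in \operatorname{Ker} D_J \quad \text{where} \quad \|\sigma\|_\infty \leq 1. \]

Replacing the inequality by a strict inequality condition gives the uniqueness of $x^*$ using Lemma 3. ■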
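The first-order condition used in this proof can be checked numerically. The sketch below is a toy denoising case ($\Phi = \mathrm{Id}$, four samples, one jump) chosen for illustration, not an experiment from the paper. It solves $\Phi^*(\Phi x^* - y) + \lambda D_I s_I + \lambda D_J \sigma = 0$ for $\sigma$ by least squares and then tests $\|\sigma\|_\infty$. The function name `certificate` and all numerical values are assumptions. When $D_J$ has a nontrivial kernel, the least-squares $\sigma$ is only one element of $\Sigma_{y,\lambda}(x^*)$, so the test is then sufficient but not necessary.

```python
import numpy as np

def certificate(Phi, D, I, J, s_I, x, y, lam):
    """Least-squares sigma for Phi^*(Phi x - y) + lam D_I s_I + lam D_J sigma = 0.

    Returns sigma and the residual norm of that equation.
    """
    r = Phi.T @ (Phi @ x - y) + lam * (D[:, I] @ s_I)
    sigma, *_ = np.linalg.lstsq(lam * D[:, J], -r, rcond=None)
    residual = float(np.linalg.norm(lam * (D[:, J] @ sigma) + r))
    return sigma, residual

# Toy denoising setup (assumed for illustration): Phi = Id, N = 4, one jump.
N = 4
D = np.zeros((N, N - 1))
for i in range(N - 1):
    D[i, i], D[i + 1, i] = -1.0, 1.0
Phi = np.eye(N)
x0 = np.array([0.0, 0.0, 1.0, 1.0])
I, J = np.array([1]), np.array([0, 2])
s_I = np.array([1.0])

# G_J = signals constant on {0, 1} and on {2, 3}; U is an orthonormal basis of G_J.
U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2.0)
A = U @ np.linalg.solve(U.T @ Phi.T @ Phi @ U, U.T)

lam, y = 0.2, Phi @ x0
x_star = A @ Phi.T @ y - lam * (A @ D[:, I] @ s_I)        # candidate from (13)
sign_ok = np.array_equal(np.sign(D[:, I].T @ x_star), s_I)  # s = sign(D^* x^*) on I
sigma, residual = certificate(Phi, D, I, J, s_I, x_star, y, lam)
is_solution = residual < 1e-10 and np.max(np.abs(sigma)) <= 1.0
is_unique = residual < 1e-10 and np.max(np.abs(sigma)) < 1.0
```

In this example both entries of `sigma` equal 1/2, so the strict inequality holds and the candidate is the unique minimizer.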

A. Proof of Theorem 1

We recall the definition of the analysis Identifiability Criterion. Given some D-support $I$ and D-cosupport $J = I^c$, we suppose that condition $(H_J)$ holds. Given some sign vector $s$ with entries in $\{-1, +1\}$, the analysis Identifiability Criterion IC of a sign vector $s$ associated to a D-support $I$ is defined as

\[ \mathrm{IC}(s) = \min_{u \in \operatorname{Ker} D_J} \|\Omega^{[J]} s_I - u\|_\infty, \]

where

\[ \Omega^{[J]} = D_J^+ (\Phi^* \Phi A^{[J]} - \mathrm{Id}) D_I. \]
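The criterion is straightforward to evaluate when $\operatorname{Ker} D_J = \{0\}$, since the minimum over $u$ then reduces to $u = 0$ and $\mathrm{IC}(s) = \|\Omega^{[J]} s_I\|_\infty$. The sketch below covers only that special case and raises an error otherwise; the general case would require a small linear program over $\operatorname{Ker} D_J$. The function name `identifiability_criterion` and the toy total-variation denoising data are assumptions made for illustration.

```python
import numpy as np

def identifiability_criterion(Phi, D, I, J, s_I, tol=1e-10):
    """IC(s) = ||Omega^{[J]} s_I||_inf, valid only when Ker(D_J) = {0}."""
    DI, DJ = D[:, I], D[:, J]
    if np.linalg.matrix_rank(DJ) < DJ.shape[1]:
        raise ValueError("Ker(D_J) is nontrivial: minimise over u in Ker(D_J) instead")
    # Orthonormal basis U of G_J = Ker(D_J^*) from the SVD of D_J^*.
    _, svals, Vt = np.linalg.svd(DJ.T, full_matrices=True)
    U = Vt[int(np.sum(svals > tol)):].T
    A = U @ np.linalg.solve(U.T @ Phi.T @ Phi @ U, U.T)           # A^{[J]}
    omega_tilde = (Phi.T @ Phi @ A - np.eye(D.shape[0])) @ DI
    omega = np.linalg.pinv(DJ) @ omega_tilde                       # Omega^{[J]}
    return float(np.max(np.abs(omega @ s_I))), omega, omega_tilde, U

# Toy total-variation denoising example (assumed): Phi = Id, x0 = (0, 0, 1, 1).
N = 4
D = np.zeros((N, N - 1))
for i in range(N - 1):
    D[i, i], D[i + 1, i] = -1.0, 1.0
I, J = np.array([1]), np.array([0, 2])
ic, omega, omega_tilde, U = identifiability_criterion(
    np.eye(N), D, I, J, np.array([1.0])
)
```

For this single-jump signal the computation gives $\mathrm{IC} = 1/2 < 1$. The returned matrices also make it possible to check the identities $U^* \tilde\Omega^{[J]} = 0$ and $\tilde\Omega^{[J]} = D_J \Omega^{[J]}$ used in the proof of Lemma 5.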

Proof of Theorem 1: The proof is done in three steps.

1) We give a condition on $\lambda$ to have $\operatorname{sign}(D^* x^*) = \operatorname{sign}(D^* x_0)$.

2) We give another condition on $\|w\|_2 / \lambda$ to ensure the first-order condition on $x^*$ assuming $\mathrm{IC} < 1$.

3) We prove that the two conditions are compatible.

We consider the vector defined by

\[ x^* = x_0 + A^{[J]} \Phi^* w - \lambda A^{[J]} D_I s_I. \]

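1. We first give a condition on $\lambda$ to ensure signs equality

\[ \operatorname{sign}(D^* x^*) = \operatorname{sign}(D^* x_0) = s. \]

Since $A^{[J]} \Phi^* y = x_0 + A^{[J]} \Phi^* w$, signs equality is achieved if

\[ \forall i \in I, \quad |D_I^* x_0|_i > |D_I^* (x^* - x_0)|_i = |D_I^* A^{[J]} \Phi^* w - \lambda D_I^* A^{[J]} D_I s_I|_i. \tag{17} \]

We bound $\|D_I^* (x^* - x_0)\|_\infty$:

\[ \|D_I^* (x^* - x_0)\|_\infty \leq \|D_I^* A^{[J]}\|_{\infty,\infty} \left( \|\Phi^* w\|_\infty + \lambda \|D_I s_I\|_\infty \right). \]

Using operator norm inequalities, one has

\[ \|D_I^* (x^* - x_0)\|_\infty \leq \|D_I^* A^{[J]}\|_{\infty,\infty} \|\Phi^*\|_{2,\infty} \|w\|_2 + \lambda \|D_I^* A^{[J]}\|_{\infty,\infty} \|D_I\|_{\infty,\infty}. \]

Introducing

\[ T = \min_{i \in \{1, \ldots, |I|\}} |D_I^* x_0|_i > 0, \]

the following condition

\[ T > \|D_I^* A^{[J]}\|_{\infty,\infty} \left( \|\Phi^*\|_{2,\infty} \|w\|_2 + \lambda \|D_I\|_{\infty,\infty} \right) \tag{18} \]

ensures (17).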

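2. We now give a condition on $\|w\|_2 / \lambda$ to ensure the first-order condition (15) assuming $\mathrm{IC}(\operatorname{sign}(D^* x_0)) < 1$. Remark that $\Pi^{[J]} y = \Pi^{[J]} w$ since $x_0 \in \mathcal{G}_J$. The minimum over $\operatorname{Ker} D_J$ of $\|\Omega^{[J]} s_I - u\|_\infty$ is reached for a given $u \in \operatorname{Ker} D_J$. We consider the following $\sigma$ defined by

\[ \sigma = -u + \Omega^{[J]} s_I - \frac{1}{\lambda} \Pi^{[J]} w. \]

Using operator norm inequality, one has

\[ \|\sigma\|_\infty \leq \|\Omega^{[J]} s_I - u\|_\infty + \frac{1}{\lambda} \|\Pi^{[J]}\|_{2,\infty} \|w\|_2. \]

By definition of $u$,

\[ \|\sigma\|_\infty \leq \mathrm{IC}(s) + \frac{1}{\lambda} \|\Pi^{[J]}\|_{2,\infty} \|w\|_2. \]

Hence, under condition $\mathrm{IC}(\operatorname{sign}(D^* x_0)) < 1$ and

\[ \frac{\|w\|_2}{\lambda} \|\Pi^{[J]}\|_{2,\infty} < 1 - \mathrm{IC}(\operatorname{sign}(D^* x_0)), \tag{19} \]

one has $\|\sigma\|_\infty < 1$ and using Lemma 5 the vector $x^*$ is the unique solution of $\mathcal{P}_\lambda(y)$.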

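3. Let us show that (18) and (19) are compatible. We introduce constants $c_J$ and $\tilde c_J$:

\[ c_J = \frac{\|\Pi^{[J]}\|_{2,\infty}}{1 - \mathrm{IC}(\operatorname{sign}(D^* x_0))} \]

and

\[ \tilde c_J = \frac{1}{\|D_I^* A^{[J]}\|_{\infty,\infty}} \left( \frac{\|\Phi^*\|_{2,\infty}}{c_J} + \|D_I\|_{\infty,\infty} \right)^{-1}. \]

Suppose that

\[ \frac{\|w\|_2}{T} < \frac{\tilde c_J}{c_J} \]

and

\[ c_J \|w\|_2 < \lambda < T \tilde c_J. \]

Then (18) and (19) are satisfied.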
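The compatibility argument can be illustrated numerically. The sketch below assumes the constants take the form $c_J = \|\Pi^{[J]}\|_{2,\infty} / (1 - \mathrm{IC})$ and $\tilde c_J = \big(\|D_I^* A^{[J]}\|_{\infty,\infty} (\|\Phi^*\|_{2,\infty}/c_J + \|D_I\|_{\infty,\infty})\big)^{-1}$, which is what conditions (18) and (19) suggest. It uses a toy denoising setup with $\operatorname{Ker} D_J = \{0\}$, so that $\mathrm{IC}(s) = \|\Omega^{[J]} s_I\|_\infty$. All numerical values and helper names are hypothetical.

```python
import numpy as np

def norm_inf_inf(M):
    """Operator norm l_inf -> l_inf: largest absolute row sum."""
    return float(np.max(np.sum(np.abs(M), axis=1)))

def norm_2_inf(M):
    """Operator norm l_2 -> l_inf: largest Euclidean row norm."""
    return float(np.max(np.linalg.norm(M, axis=1)))

# Toy denoising setup (assumed): Phi = Id, x0 = (0, 0, 1, 1), Ker(D_J) = {0}.
N = 4
D = np.zeros((N, N - 1))
for i in range(N - 1):
    D[i, i], D[i + 1, i] = -1.0, 1.0
Phi = np.eye(N)
x0 = np.array([0.0, 0.0, 1.0, 1.0])
I, J = np.array([1]), np.array([0, 2])
DI, DJ = D[:, I], D[:, J]
s_I = np.sign(DI.T @ x0)

U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]]) / np.sqrt(2.0)   # basis of G_J
A = U @ np.linalg.solve(U.T @ Phi.T @ Phi @ U, U.T)
DJ_pinv = np.linalg.pinv(DJ)
Omega = DJ_pinv @ (Phi.T @ Phi @ A - np.eye(N)) @ DI
Pi = DJ_pinv @ Phi.T @ (Phi @ A @ Phi.T - np.eye(Phi.shape[0]))
ic = float(np.max(np.abs(Omega @ s_I)))          # IC(s) when Ker(D_J) = {0}

T = float(np.min(np.abs(DI.T @ x0)))
c_J = norm_2_inf(Pi) / (1.0 - ic)
c_J_tilde = 1.0 / (norm_inf_inf(DI.T @ A)
                   * (norm_2_inf(Phi.T) / c_J + norm_inf_inf(DI)))

w = 0.05 * np.array([1.0, -2.0, 0.5, 1.0]) / 2.5   # noise with ||w||_2 = 0.05
w_norm = float(np.linalg.norm(w))
lam_low, lam_high = c_J * w_norm, T * c_J_tilde
lam = 0.5 * (lam_low + lam_high)                    # any value in the open interval

cond_18 = T > norm_inf_inf(DI.T @ A) * (norm_2_inf(Phi.T) * w_norm
                                        + lam * norm_inf_inf(DI))
cond_19 = (w_norm / lam) * norm_2_inf(Pi) < 1.0 - ic

# Candidate vector of the proof and its certificate on J.
y = Phi @ x0 + w
x_star = x0 + A @ Phi.T @ w - lam * (A @ DI @ s_I)
r = Phi.T @ (Phi @ x_star - y) + lam * (DI @ s_I)
sigma = DJ_pinv @ (-r / lam)
```

With these values the interval for $\lambda$ is nonempty, both conditions hold, the sign of $D_I^* x^*$ matches $s_I$, and the certificate `sigma` has sup-norm strictly below 1, as the proof predicts.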
B. Proof of Theorem 2

The proof of Theorem 2 is done in three steps. First, we specialize Theorem 1 when $w = 0$. Then, we show that under the condition $\mathrm{IC}(\operatorname{sign}(D^* x_0)) < 1$, the vector $x_0$ is a solution of $\mathcal{P}_0(y)$. Finally, we prove Theorem 2 by considering another potential solution of $\mathcal{P}_0(y)$.

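Corollary 2. Let $x_0 \in \mathbb{R}^N$ be a fixed vector, $I$ be its D-support, and $y = \Phi x_0$. Suppose $(H_J)$ holds and $\mathrm{IC}(\operatorname{sign}(D^* x_0)) < 1$. Then for $\lambda < T \tilde c_J$,

\[ x^* = A^{[J]} \Phi^* y - \lambda A^{[J]} D_I s_I \quad \text{where} \quad s = \operatorname{sign}(D^* x_0), \]

is the unique solution of $\mathcal{P}_\lambda(y)$.

Proof: Take $w = 0$ in Theorem 1. ■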

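Lemma 6. Let $x_0 \in \mathbb{R}^N$ be a fixed vector, $I$ be its D-support, and $y = \Phi x_0$. Suppose $(H_J)$ holds and $\mathrm{IC}(\operatorname{sign}(D^* x_0)) < 1$. Then $x_0$ is a solution of $\mathcal{P}_0(y)$.

Proof: According to Corollary 2, $\mathcal{P}_\lambda(y)$ has a unique solution for $\lambda < T \tilde c_J$,

\[ x_\lambda = x^* = x_0 - \lambda A^{[J]} D_I s_I. \]

Let $x_{(1)} \neq x_0$ be such that $\Phi x_{(1)} = y$. For every such strictly positive $\lambda$, one has $\mathcal{L}_{y,\lambda}(x_\lambda) < \mathcal{L}_{y,\lambda}(x_{(1)})$ by definition of $x_\lambda$. Then,

\[ \|D^* x_\lambda\|_1 < \|D^* x_{(1)}\|_1. \]

Using continuity of norms, taking the limit $\lambda \to 0$ in this equation gives

\[ \|D^* x_0\|_1 \leq \|D^* x_{(1)}\|_1, \]

which proves that $x_0$ is a solution of $\mathcal{P}_0(y)$. ■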
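Proof of Theorem 2: Using Lemma 6, $x_0$ is a solution of $\mathcal{P}_0(y)$. We shall prove that $x_0$ is the unique solution. Let us denote

\[ x_{(1)} = x_0 + \lambda A^{[J]} D_I s_I. \]

Note that for $\lambda$ small enough, one has $\operatorname{sign}(D^* x_{(1)}) = \operatorname{sign}(D^* x_0)$. Hence, if $\mathrm{IC}(\operatorname{sign}(D^* x_0)) < 1$, then Corollary 2 holds and $x_0$ is the unique solution of $\mathcal{P}_\lambda(y_1)$ where $y_1 = \Phi x_{(1)}$.

Let $x_{(2)} \in \mathbb{R}^N$ be such that $\Phi x_{(2)} = y$ with $x_{(2)} \neq x_0$. Then $\Phi x_0 = \Phi x_{(2)}$ and since $x_0$ is the unique solution of $\mathcal{P}_\lambda(y_1)$, one has

\[ \tfrac{1}{2}\|y_1 - \Phi x_0\|_2^2 + \lambda \|D^* x_0\|_1 < \tfrac{1}{2}\|y_1 - \Phi x_{(2)}\|_2^2 + \lambda \|D^* x_{(2)}\|_1. \]

Then,

\[ \|D^* x_0\|_1 < \|D^* x_{(2)}\|_1, \]

which gives uniqueness of the solution. ■

C. Proof of Theorem 3

We recall that the Recovery Criterion RC of $I \subset \{1, \ldots, P\}$ is defined as

\[ \mathrm{RC}(I) = \max_{\|p_I\|_\infty \leq 1} \; \min_{u \in \operatorname{Ker} D_J} \|\Omega^{[J]} p_I - u\|_\infty. \]

Proof of Theorem 3: Consider the following restricted problem

\[ \underset{x \in \mathcal{G}_J}{\operatorname{argmin}} \; \tfrac{1}{2}\|y - \Phi x\|_2^2 + \lambda \|D^* x\|_1. \tag{$\mathcal{P}_\lambda^J(y)$} \]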



Our strategy is to consider a solution of $\mathcal{P}_\lambda^J(y)$ and to show that it is the unique solution of $\mathcal{P}_\lambda(y)$. To achieve this goal, we use four steps:

1) We exhibit $p_I^* \in \mathbb{R}^{|I|}$ such that

\[ U^* [\Phi^*(\Phi x^* - y) + \lambda D_I p_I^*] = 0. \]

2) We prove that $x^*$ satisfies an equation of the form

\[ x^* = A^{[J]} \Phi^* y - \lambda A^{[J]} D_I p_I^*. \]

3) We prove that $x^*$ satisfies the first-order condition of Lemma 1 using the construction of $p_I^*$.

4) Finally, using operator norm inequalities we provide the bound announced in the statement.

We rewrite $\mathcal{P}_\lambda^J(y)$ without constraints

\[ \underset{\alpha}{\operatorname{argmin}} \; \tfrac{1}{2}\|y - \Phi U \alpha\|_2^2 + \lambda \|D_I^* U \alpha\|_1. \]

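1. Using Lemma 1 with $\Phi U$ and $D_I^* U$ in place of $\Phi$ and $D^*$, if $\alpha^*$ is a solution of $\mathcal{P}_\lambda^J(y)$, then there exists $\sigma^*$ with $\|\sigma^*\|_\infty \leq 1$ such that

\[ U^* \Phi^* (\Phi U \alpha^* - y) + \lambda (U^* D_I)_{I^*} s_{I^*} + \lambda (U^* D_I)_{J^*} \sigma^* = 0, \]

where $I^* \subseteq I$ is the D-support of $U\alpha^*$ and $J^* = (I^*)^c \cap I$. We introduce $p_I^* \in \mathbb{R}^{|I|}$ defined as

\[ \forall i \in I, \quad (p_I^*)_i = \begin{cases} s_i & \text{if } i \in I^*, \\ \sigma_i^* & \text{if } i \in J^*, \end{cases} \]

which satisfies

\[ D_I p_I^* = D_{I^*} s_{I^*} + D_{J^*} \sigma^*. \]

First-order conditions become

\[ U^* [\Phi^*(\Phi U \alpha^* - y) + \lambda D_I p_I^*] = 0. \tag{20} \]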

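2. Moreover, the condition $(H_J)$ holds, so the matrix $U^* \Phi^* \Phi U$ is invertible, and one has

\[ \alpha^* = (U^* \Phi^* \Phi U)^{-1} U^* \Phi^* y - \lambda (U^* \Phi^* \Phi U)^{-1} U^* D_I p_I^*. \]

Denoting $x^* = U\alpha^*$ and multiplying both sides by $U$ gives

\[ x^* = A^{[J]} \Phi^* y - \lambda A^{[J]} D_I p_I^*. \tag{21} \]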

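3. We now prove that $x^*$ is a solution of $\mathcal{P}_\lambda(y)$, i.e., there exists $\sigma$ such that

\[ \Phi^*(\Phi x^* - y) + \lambda D_{I^*} s_{I^*} + \lambda D_{J^* \cup J} \sigma = 0 \quad \text{and} \quad \|\sigma\|_\infty \leq 1. \]

Consider $u$ and $\bar\sigma$ such that

\[ u \in \underset{u \in \operatorname{Ker} D_J}{\operatorname{argmin}} \|\Omega^{[J]} p_I^* - u\|_\infty \]

and

\[ \bar\sigma = \Omega^{[J]} p_I^* - u - \frac{1}{\lambda} \Pi^{[J]} w. \]

We recall that

\[ \tilde\Omega^{[J]} = (\Phi^* \Phi A^{[J]} - \mathrm{Id}) D_I, \qquad \tilde\Pi^{[J]} = \Phi^* (\Phi A^{[J]} \Phi^* - \mathrm{Id}). \]

Remark that using equation (21), one has

\[
\begin{aligned}
\Phi^*(\Phi x^* - y) + \lambda D_I p_I^* + \lambda D_J \bar\sigma
&= \Phi^*(\Phi x^* - y) + \lambda D_I p_I^* + \lambda D_J D_J^+ \tilde\Omega^{[J]} p_I^* - \underbrace{\lambda D_J u}_{=0} - D_J D_J^+ \tilde\Pi^{[J]} y \\
&= (\mathrm{Id} - D_J D_J^+)(\tilde\Pi^{[J]} y - \lambda \tilde\Omega^{[J]} p_I^*) \\
&= (\mathrm{Id} - D_J D_J^+)[\Phi^*(\Phi x^* - y) + \lambda D_I p_I^*].
\end{aligned}
\]

Let us denote $v = \Phi^*(\Phi x^* - y) + \lambda D_I p_I^*$. On one hand, multiplying the last expression by the pseudo-inverse $D_J^+$ and using the fact that $D_J^+ D_J D_J^+ = D_J^+$, one has

\[ (\mathrm{Id} - D_J D_J^+) v \in \operatorname{Ker} D_J^+ = \mathcal{G}_J. \]

On the other hand, using equation (20), we remark that $v \in \operatorname{Ker} U^*$. Since $\operatorname{Ker} U^* = (\operatorname{Im} U)^\perp = \mathcal{G}_J^\perp$, one has

\[ v \in \mathcal{G}_J^\perp \quad \text{and} \quad (\mathrm{Id} - D_J D_J^+) v \in \mathcal{G}_J. \]

However, $D_J D_J^+ v \in \mathcal{G}_J^\perp$. Hence,

\[ (\mathrm{Id} - D_J D_J^+)[\Phi^*(\Phi x^* - y) + \lambda D_I p_I^*] \in \mathcal{G}_J \cap \mathcal{G}_J^\perp = \{0\}. \]

Hence,

\[ \Phi^*(\Phi x^* - y) + \lambda D_I p_I^* + \lambda D_J \bar\sigma = 0. \]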
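Using operator norm inequality, the vector $\bar\sigma$ constructed above satisfies

\[ \|\bar\sigma\|_\infty \leq \|\Omega^{[J]} p_I^* - u\|_\infty + \frac{1}{\lambda} \|\Pi^{[J]}\|_{2,\infty} \|w\|_2. \]

By definition of $u$,

\[ \|\bar\sigma\|_\infty \leq \min_{u \in \operatorname{Ker} D_J} \|\Omega^{[J]} p_I^* - u\|_\infty + \frac{1}{\lambda} \|\Pi^{[J]}\|_{2,\infty} \|w\|_2. \]

Hence,

\[ \|\bar\sigma\|_\infty \leq \mathrm{RC}(I) + \frac{1}{\lambda} \|\Pi^{[J]}\|_{2,\infty} \|w\|_2. \]

Hence, for $\mathrm{RC}(I) < 1$, $\sigma$ defined by

\[ \forall j \in \{1, \ldots, P\} \setminus I^*, \quad \sigma_j = \begin{cases} \sigma_j^* & \text{if } j \in J^*, \\ \bar\sigma_j & \text{if } j \in J, \end{cases} \]

and

\[ \lambda > \|w\|_2 \frac{c_J}{1 - \mathrm{RC}(I)} \quad \text{where} \quad c_J = \|\Pi^{[J]}\|_{2,\infty}, \]

one has $\|\bar\sigma\|_\infty < 1$ and $\|\sigma\|_\infty = \max(\|\bar\sigma\|_\infty, \|\sigma^*\|_\infty) \leq 1$. Using Lemma 1, the vector $x^*$ is a solution of $\mathcal{P}_\lambda(y)$. Moreover, since $\|\sigma_J\|_\infty < 1$ and $(H_J)$ holds, $x^*$ is the unique solution of $\mathcal{P}_\lambda(y)$ according to Lemma 3.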

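4. We now bound the distance between $x_0$ and $x^*$.

\[ \|x^* - x_0\|_2 = \|A^{[J]} \Phi^* y - \lambda A^{[J]} D_I p_I^* - x_0\|_2. \]

We remark that $A^{[J]} \Phi^* y = x_0 + A^{[J]} \Phi^* w$. Hence,

\[ \|x^* - x_0\|_2 = \|A^{[J]} (\Phi^* w - \lambda D_I p_I^*)\|_2. \]

Using operator norm inequality, one has

\[ \|x^* - x_0\|_2 \leq \|A^{[J]}\|_{2,2} \|w\|_2 \left( \|\Phi\|_{2,2} + \frac{\rho c_J}{1 - \mathrm{RC}(I)} \|D_I\|_{\infty,2} \right). \]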
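The criterion $\mathrm{RC}(I)$ recalled at the start of this proof can also be evaluated on small examples. The sketch below again assumes $\operatorname{Ker} D_J = \{0\}$, in which case the inner minimum is attained at $u = 0$ and $\mathrm{RC}(I) = \|\Omega^{[J]}\|_{\infty,\infty}$, the largest absolute row sum. The toy staircase signal, the denoising operator $\Phi = \mathrm{Id}$ and the function names are assumptions chosen for illustration.

```python
import numpy as np
from itertools import product

def omega_matrix(Phi, D, I, J, tol=1e-10):
    """Omega^{[J]} = D_J^+ (Phi^* Phi A^{[J]} - Id) D_I, assuming (H_J) holds."""
    DI, DJ = D[:, I], D[:, J]
    _, svals, Vt = np.linalg.svd(DJ.T, full_matrices=True)
    U = Vt[int(np.sum(svals > tol)):].T            # basis of G_J = Ker(D_J^*)
    A = U @ np.linalg.solve(U.T @ Phi.T @ Phi @ U, U.T)
    return np.linalg.pinv(DJ) @ (Phi.T @ Phi @ A - np.eye(D.shape[0])) @ DI

def recovery_criterion(Omega):
    """RC(I) = ||Omega||_{inf,inf} (largest absolute row sum) when Ker(D_J) = {0}."""
    return float(np.max(np.sum(np.abs(Omega), axis=1)))

# Toy staircase (assumed): Phi = Id, x0 = (0, 0, 1, 1, 3, 3), two jumps of equal sign.
N = 6
D = np.zeros((N, N - 1))
for i in range(N - 1):
    D[i, i], D[i + 1, i] = -1.0, 1.0
I, J = np.array([1, 3]), np.array([0, 2, 4])
Omega = omega_matrix(np.eye(N), D, I, J)

rc = recovery_criterion(Omega)

# The maximum of a convex function over the cube ||p_I||_inf <= 1 is attained
# at a vertex, so enumerating sign patterns must give the same value.
rc_bruteforce = max(
    float(np.max(np.abs(Omega @ np.array(p))))
    for p in product([-1.0, 1.0], repeat=len(I))
)

ic_same_sign = float(np.max(np.abs(Omega @ np.array([1.0, 1.0]))))
ic_opposite_sign = float(np.max(np.abs(Omega @ np.array([1.0, -1.0]))))
```

In this toy staircase $\mathrm{RC}(I)$ equals 1, so the hypothesis $\mathrm{RC}(I) < 1$ is not met, and $\mathrm{IC}$ also equals 1 for two jumps of the same sign. Two jumps of opposite sign on the same support give $\mathrm{IC} = 1/2$.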



Conclusion 

This paper has provided a theoretical analysis of the robustness of sparse analysis regularizations. We have studied the robustness to small and large noise. These contributions enable a better understanding of the behavior of this class of regularizations.

Concrete examples illustrate our results. For discrete total variation, we show that staircasing induces an instability of the support, i.e., discontinuities are not preserved. For Fused Lasso, our analysis shows that the support is stable and robust to an arbitrary bounded noise.

A distinctive feature of our approach is that we look for the robustness of the cospace associated to the original data. This approach often has a meaningful interpretation (such as the conservation of discontinuities for TV-like models); however, it also leads to quite restrictive conditions. A fascinating area for future work is to understand how to lift these restrictions to obtain sharper noise robustness of analysis regularization.